Animated Singapore Age-Sex pyramid from 2000 to 2020
In this take-home exercise, the task is to design an age-sex pyramid visualization that shows the change in the demographic structure of Singapore by age cohort and gender between 2000 and 2020 at the planning area level.
The challenge of this exercise is to show the additional variables in a clear way. Typically, an age-sex pyramid allows the reader to see the distribution of the population by age group and sex. Animation can be added to show the changes in each group over the years. However, presenting the data at the planning area level is difficult because the planning area variable has numerous levels.
The proposed solution is a modified age-sex pyramid, with the planning areas on the Y-axis instead of the age groups. Numbers in each age group are represented by points in the chart, with shapes to differentiate between genders and colors to indicate the age group. Animation is used to show changes in the various demographics over the years. A sequential color scheme can be used for the age groups. Using sequential colors makes reading the chart intuitive and requires less mental load to keep track of the legend.
For this task, the following R packages are loaded:
1. tidyverse: for loading and wrangling the data.
2. gganimate: for animating the static plot.
3. plotly: an alternative animation package, loaded for exploration. Plotly allows interactive tooltips to be combined with animation.
4. RColorBrewer: for additional color palettes.
5. rmarkdown: for displaying the data table.
packages = c('tidyverse', 'plotly', 'gganimate', 'RColorBrewer', 'rmarkdown')
# Install any package that is missing, then attach it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
Population data of Singapore from 2000 to 2020 was obtained from the Singapore Department of Statistics.
The data is saved in CSV format, so the read_csv() function from readr (loaded as part of the tidyverse) is used. Because the data is stored in two separate files, each file is loaded into its own tibble object.
pop2000to2010 <- read_csv("data/respopagesextod2000to2010.csv")
pop2011to2020 <- read_csv("data/respopagesextod2011to2020.csv")
The two tibbles are then stacked with the union() function to combine the population numbers from both periods.
pop2000to2021 <- union(pop2000to2010, pop2011to2020)
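One point worth keeping in mind is that union() follows set semantics, so a row that appears in both tibbles is kept only once, whereas bind_rows() stacks everything as-is. The sketch below illustrates the difference on two small made-up tibbles (`a` and `b` are hypothetical and not the real population files):

```r
library(dplyr)

# Hypothetical toy data: the row (2010, Bedok, 20) appears in both tibbles
a <- tibble(Time = c(2009, 2010), PA = c("Bedok", "Bedok"), Pop = c(10, 20))
b <- tibble(Time = c(2010, 2011), PA = c("Bedok", "Bedok"), Pop = c(20, 30))

stacked_union <- union(a, b)      # set semantics: the shared row is kept once
stacked_bind  <- bind_rows(a, b)  # plain stacking: the shared row is kept twice

nrow(stacked_union)
nrow(stacked_bind)
```

For the population files this should make no difference as long as the two periods do not overlap and every row is unique, which is an assumption rather than something checked here.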
In the following code chunk, the five-year age groups are combined into broader bands of youths, adults, middle age, seniors and elderly.
pop2000to2021 <- pop2000to2021 %>% mutate(Age_Groups = case_when(AG == "0_to_4" ~ "00 to 19: Youths",
AG == "5_to_9" ~ "00 to 19: Youths",
AG == "10_to_14" ~ "00 to 19: Youths",
AG == "15_to_19" ~ "00 to 19: Youths",
AG == "20_to_24" ~ "20 to 39: Adults",
AG == "25_to_29" ~ "20 to 39: Adults",
AG == "30_to_34" ~ "20 to 39: Adults",
AG == "35_to_39" ~ "20 to 39: Adults",
AG == "40_to_44" ~ "40 to 59: Middle Age",
AG == "45_to_49" ~ "40 to 59: Middle Age",
AG == "50_to_54" ~ "40 to 59: Middle Age",
AG == "55_to_59" ~ "40 to 59: Middle Age",
AG == "60_to_64" ~ "60 to 79: Seniors",
AG == "65_to_69" ~ "60 to 79: Seniors",
AG == "70_to_74" ~ "60 to 79: Seniors",
AG == "75_to_79" ~ "60 to 79: Seniors",
AG == "80_to_84" ~ "80 and above: Elderly",
AG == "85_to_89" ~ "80 and above: Elderly",
AG == "90_and_over" ~ "80 and above: Elderly",
TRUE ~ AG))
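The same mapping could arguably be written more compactly by testing membership with %in%, one line per band. This is only a sketch on a hypothetical four-row tibble (`toy`), using the AG codes and band labels from the chunk above:

```r
library(dplyr)

# Toy input: a handful of AG codes in the format used above
toy <- tibble(AG = c("0_to_4", "25_to_29", "55_to_59", "90_and_over"))

toy_banded <- toy %>%
  mutate(Age_Groups = case_when(
    AG %in% c("0_to_4", "5_to_9", "10_to_14", "15_to_19")     ~ "00 to 19: Youths",
    AG %in% c("20_to_24", "25_to_29", "30_to_34", "35_to_39") ~ "20 to 39: Adults",
    AG %in% c("40_to_44", "45_to_49", "50_to_54", "55_to_59") ~ "40 to 59: Middle Age",
    AG %in% c("60_to_64", "65_to_69", "70_to_74", "75_to_79") ~ "60 to 79: Seniors",
    AG %in% c("80_to_84", "85_to_89", "90_and_over")          ~ "80 and above: Elderly",
    TRUE ~ AG  # any unexpected code passes through unchanged
  ))

toy_banded
```

The leading "00" in the first label keeps the bands in the intended order when they are sorted alphabetically, for example in the legend.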
In contrast to the previous exercise, the planning area information is needed here, so it is kept in the group_by() call. The population numbers are summarized by year, planning area, age group and gender.
pop2021_grouped <- pop2000to2021 %>%
group_by(Time, PA, Age_Groups, Sex) %>%
summarise(n = sum(Pop)) %>%
ungroup()
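To make the effect of the grouping concrete, the sketch below runs the same group-and-sum step on a made-up three-row tibble (`toy` is hypothetical). Rows that share the same year, planning area, age band and sex collapse into a single row holding their total. The `.groups = "drop"` argument is used here as an alternative to the trailing ungroup(), assuming dplyr 1.0 or later:

```r
library(dplyr)

# Hypothetical toy data: two "Males" rows that should be summed together
toy <- tibble(
  Time       = c(2000, 2000, 2000),
  PA         = c("Bedok", "Bedok", "Bedok"),
  Age_Groups = c("20 to 39: Adults", "20 to 39: Adults", "20 to 39: Adults"),
  Sex        = c("Males", "Males", "Females"),
  Pop        = c(100, 50, 120)
)

toy_grouped <- toy %>%
  group_by(Time, PA, Age_Groups, Sex) %>%
  summarise(n = sum(Pop), .groups = "drop")  # returns an ungrouped tibble

toy_grouped
```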
To display the values for the male population "below the axis", the population values for all male age groups are first converted to negative values, so that when the chart is generated the points for males fall on the negative side of the axis in the pyramid plot. The axis labels are then manually overwritten to hide the negative signs, as suggested in this RPubs article.
pop2021_grouped <- pop2021_grouped %>% mutate(Pop2 = case_when(Sex == "Males" ~ 0-n,
TRUE ~ n))
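As a quick illustration of the sign flip, the sketch below applies the same idea to a hypothetical two-row tibble, using if_else() as a two-branch alternative to case_when():

```r
library(dplyr)

# Hypothetical toy data: one row per sex
toy <- tibble(Sex = c("Males", "Females"), n = c(150, 120))

# Male counts become negative, female counts stay positive
toy_signed <- toy %>%
  mutate(Pop2 = if_else(Sex == "Males", -n, n))

toy_signed$Pop2
```

Taking abs(Pop2) recovers the original counts, which is why the negative sign can later be hidden on the axis without losing information.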
A final check of the resulting data is made before visualizing it.
paged_table(pop2021_grouped)
Finally, the data is visualized in the following code chunk.
ggplot(pop2021_grouped, aes(x = PA, y = Pop2, color = Age_Groups, shape = Sex)) +
geom_point() +
scale_y_continuous(breaks = as.integer(seq(-50000, 50000, 25000)),
labels = as.integer(c(seq(50000,0, by = -25000), seq(25000, 50000, by = 25000)))) +
coord_flip() +
ylab("Population Numbers") +
xlab("Planning Area") +
ggtitle("Age-Sex Pyramid Singapore Year {as.integer(frame_time)}") +
theme(panel.background = element_rect(fill = "white",
color = "black",
linetype = "solid"),
panel.grid.major = element_line(size = 0.25,
linetype = 'solid',
colour = "lightgrey"),
panel.grid.minor = element_line(size = 0.10,
linetype = 'solid',
colour = "lightgrey"),
axis.text.x = element_text(size = 5),
axis.text.y = element_text(size = 5),
legend.position = "right") +
scale_color_brewer(palette = "Blues") +
transition_time(Time) +
ease_aes('linear')
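The hand-written labels in scale_y_continuous() above are simply the absolute values of the breaks. The short check below confirms this in base R, which suggests the two vectors do not need to be kept in sync by hand:

```r
# Breaks and manual labels exactly as written in the plotting chunk
breaks <- as.integer(seq(-50000, 50000, 25000))
manual_labels <- as.integer(c(seq(50000, 0, by = -25000),
                              seq(25000, 50000, by = 25000)))

# The manual labels match abs() applied to the breaks
labels_match <- identical(abs(breaks), manual_labels)
labels_match
```

Since the `labels` argument also accepts a function of the breaks, `scale_y_continuous(breaks = breaks, labels = abs)` is a commonly used shorthand for pyramid plots. It is offered here as an alternative sketch, not as what the chunk above does.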
Reviewing the generated visualization, the viewer can see the relative population size of each planning area. Population growth over the years is concentrated in Bedok, Jurong West, Hougang, Tampines and Woodlands. These areas are also where much of the younger population resides. Smaller rates of growth were also observed in Novena and Tanglin. Another interesting observation is the rapid growth of Punggol, Serangoon and Seng Kang over the years, indicating the development of these areas as residential areas.
As an experiment, the native animation function from plotly is trialed below. Plotly supports both animation and tooltips, so viewers can take a detailed look at the data. However, an initial attempt to keep the typical age-sex pyramid format, with the age groups on the Y-axis and colors for the planning areas, resulted in a very busy plot that required the viewer to toggle between the legend and the chart to keep track of the planning areas. It seems better to place the planning areas on the Y-axis, which makes comparison easier and at the same time reduces the number of color levels in the chart. For specific data points, the viewer can use the tooltips to look at the individual values.
p <- ggplot(pop2021_grouped, aes(x = PA,
y = Pop2,
color = Age_Groups,
frame = Time)) +
geom_point() +
aes(shape = Sex) +
scale_y_continuous(breaks = as.integer(seq(-50000, 50000, 25000)),
labels = as.integer(c(seq(50000,0, by = -25000), seq(25000, 50000, by = 25000)))) +
coord_flip() +
ylab("Population Numbers") +
xlab("Planning Area") +
ggtitle("Singapore Age-Sex Pyramid (2000 - 2020)") +
theme(panel.background = element_rect(fill = "white",
color = "black",
linetype = "solid"),
panel.grid.major = element_line(size = 0.25,
linetype = 'solid',
colour = "lightgrey"),
panel.grid.minor = element_line(size = 0.10,
linetype = 'solid',
colour = "lightgrey"),
axis.text = element_text(size = 10)) +
scale_color_brewer(palette = "Blues")
ggplotly(p)